Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Delete forecast API (#31134) #33218

Merged
merged 12 commits into from
Sep 4, 2018

Conversation

benwtrent
Copy link
Member

Integration tests added to test the action. Verified that the API is published appropriately and works manually as well.

Addresses feature request: (#31134)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Left some minor comments and a bigger one about whether we should handle deletion of multiple forecasts at once.

@@ -161,7 +161,8 @@
public static final String REST_JOB_NOT_CLOSED_REVERT = "Can only revert to a model snapshot when the job is closed.";
public static final String REST_NO_SUCH_MODEL_SNAPSHOT = "No model snapshot with id [{0}] exists for job [{1}]";
public static final String REST_START_AFTER_END = "Invalid time range: end time ''{0}'' is earlier than start time ''{1}''.";

public static final String REST_NO_SUCH_FORECAST = "No forecast with id [{0}] exists for job [{1}]";
public static final String REST_BAD_FORECAST_STATE = "Forecast [{0}] for job [{1}] needs to be either FAILED or FINISHED to be deleted";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name could be more descriptive here. Something like REST_CANNOT_DELETE_FORECAST_IN_CURRENT_STATE.

return;
}

if (DELETABLE_STATUSES.contains(forecastRequestStats.getStatus())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could reduce the indentation here by checking if the status is not deletable first, then proceed in doing the deletion. I leave it on your preference whether you want to change it.

DeleteByQueryRequest deleteByQueryRequest = buildDeleteByQuery(jobId, forecastId);
executeAsyncWithOrigin(client, ML_ORIGIN, DeleteByQueryAction.INSTANCE, deleteByQueryRequest, ActionListener.wrap(
response -> {
if (response.getDeleted() > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should handle error paths here. We need to check if the request timed out and we need to check if we got any bulk failures. I noticed we don't do that from the expired data removers, but arguably we should also be doing better error checking in those.

@Override
protected RestChannelConsumer prepareRequest(RestRequest restRequest, NodeClient client) throws IOException {
String jobId = restRequest.param(Job.ID.getPreferredName());
String forecastId = restRequest.param(Forecast.FORECAST_ID.getPreferredName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the delete may take a while, we should be allowing for timeouts. See datafeed deletion as an example.

public static class Request extends ActionRequest {

private String jobId;
private String forecastId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrary to the other delete requests we currently have, this one deletes results instead of resources. I can imagine the UI allowing users to select multiple forecasts and then delete them in bulk. It would be a pain for the UI to have to perform a request per forecast. I think we should consider allowing this to handle deletion of multiple forecasts. Users can pass a comma separated list of forecast IDs.

@@ -276,6 +278,55 @@ public void testOverflowToDisk() throws Exception {

}

public void testDelete() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add a YML test in forecast.yml. Ping me if you need a tutorial on how those work.

@dimitris-athanasiou dimitris-athanasiou changed the title Delete forecast API (#31134) [ML] Delete forecast API (#31134) Aug 29, 2018
@dimitris-athanasiou
Copy link
Contributor

Also, don't forget to add an entry to add this API to the client once merged in.

@benwtrent
Copy link
Member Author

@dimitris-athanasiou most definitely :)

.minimumShouldMatch(1)
.must(QueryBuilders.termsQuery(Result.RESULT_TYPE.getPreferredName(), ForecastRequestStats.RESULT_TYPE_VALUE))
.should(QueryBuilders.boolQuery()
.must(QueryBuilders.termQuery(Job.ID.getPreferredName(), jobId))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the forecastId is not _all, I think we should add a terms query here with the requested forecast IDs. It makes this more efficient, it saves having to filter the forecast IDs later on and it dodges the 10K limit in the odd case.

.should(QueryBuilders.boolQuery()
.must(QueryBuilders.termQuery(Job.ID.getPreferredName(), jobId))))
.size(MAX_FORECAST_TO_SEARCH);
SearchRequest searchRequest = new SearchRequest(RESULTS_INDEX_PATTERN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should set AnomalyDetectorsIndex.jobResultsAliasedName. The aliases contain a filter on the job_id which means we won't match forecasts of other jobs. This is important to dodge the 10K limit.

ActionListener<SearchResponse> forecastStatsHandler = ActionListener.wrap(
searchResponse -> deleteForecasts(searchResponse, request, listener),
e -> listener.onFailure(new ElasticsearchException("An error occurred while searching forecasts to delete", e)));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the forecast_id is _all, we need to check the cluster setting action.destructive_requires_name (see https://www.elastic.co/guide/en/elasticsearch/guide/current/_deleting_an_index.html). If that is true, we shouldn't allow _all.

e -> listener.onFailure(new ElasticsearchException("An error occurred while searching forecasts to delete", e)));

SearchSourceBuilder source = new SearchSourceBuilder();
source.query(QueryBuilders.boolQuery()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query here should be in a filter context (see https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html). When we are not interested in scoring, we should be using the filter context.

.minimumShouldMatch(1)
.must(QueryBuilders.termsQuery(Result.RESULT_TYPE.getPreferredName(), ForecastRequestStats.RESULT_TYPE_VALUE))
.should(QueryBuilders.boolQuery()
.must(QueryBuilders.termQuery(Job.ID.getPreferredName(), jobId))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be necessary when we use the job results index alias (see below).

new TimeoutException("Unable to delete all requested forecasts. Deleted a total of " + response.getDeleted()));
return;
}
if (response.getBulkFailures().isEmpty() && response.getSearchFailures().isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is checking there are no failures. It should be changed to check there are failures.

DeleteByQueryRequest request = new DeleteByQueryRequest(searchRequest)
.setAbortOnVersionConflict(false)
.setSize(MAX_FORECAST_TO_SEARCH)
.setSlices(5);

searchRequest.indices(RESULTS_INDEX_PATTERN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, use job results index alias here.

.must(QueryBuilders.termQuery(Job.ID.getPreferredName(), jobId))
.must(QueryBuilders.termQuery(Forecast.FORECAST_ID.getPreferredName(), forecastId)));
.must(QueryBuilders.termQuery(Forecast.FORECAST_ID.getPreferredName(), forecastToDelete)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use a termsQuery instead where you pass all the IDs and you don't have to do the for loop.

"allow_no_forecasts": {
"type": "boolean",
"required": false,
"description": "Whether to ignore if `_all` or `*` matches no forecasts"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also remove * from here.

- do:
headers:
Authorization: "Basic eF9wYWNrX3Jlc3RfdXNlcjp4LXBhY2stdGVzdC1wYXNzd29yZA==" # run as x_pack_rest_user, i.e. the test setup superuser
xpack.ml.post_data:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this setup being necessary for any of the tests. However, it would be nice to test the delete actually removes forecasts. You will need to index a forecast manually (a forecast stats doc plus a couple of forecast docs). We follow a similar approach in the YAML tests of the results.

return;
}
if (response.getBulkFailures().isEmpty() && response.getSearchFailures().isEmpty()) {
if ((response.getBulkFailures().isEmpty() && response.getSearchFailures().isEmpty()) == false) {
Tuple<RestStatus, Throwable> statusAndReason = getStatusAndReason(response);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed that we should ignore version conflict exceptions as they imply the document has already been deleted. Won't those be included in the bulk failures? If yes, I don't see where we filter them out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimitris-athanasiou I tested with ignore false and true. When the ignore is true, no exception is thrown due to version conflict, however, if the ignore is false, I do get an exception thrown. So, I don't think any filtering is necessary from my testing.

I added a test that fails when that field is false (as it does not expect an exception), and passes when true so that we can be alerted through regression if this behavior changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which ignore are you referring to?


if (MetaData.ALL.equals(forecastsExpression) || Regex.isMatchAllPattern(forecastsExpression)) {
return new HashSet<>(allStats);
XContentParser parser = XContentFactory.xContent(XContentType.JSON).createParser(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser should be created in a try-with-resource.

@dimitris-athanasiou
Copy link
Contributor

Almost there! Left 2 more comments that for some reason github shows as outdated.

XContentParser parser = XContentFactory.xContent(XContentType.JSON).createParser(
NamedXContentRegistry.EMPTY, DeprecationHandler.THROW_UNSUPPORTED_OPERATION, hit.getSourceRef().streamInput());
allStats.add(ForecastRequestStats.STRICT_PARSER.apply(parser, null));
try (InputStream stream = hit.getSourceRef().streamInput();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we don't need the stream in a separate variable. The parser owns it and will handle closing the stream.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent benwtrent merged commit 767d8e0 into elastic:master Sep 4, 2018
@benwtrent benwtrent deleted the feature/delete-forecast-api branch September 4, 2018 00:06
benwtrent added a commit that referenced this pull request Sep 4, 2018
jasontedor added a commit that referenced this pull request Sep 4, 2018
@jasontedor
Copy link
Member

I reverted this commit from 6.x via bf01cda because it broke compilation there.

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Sep 4, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 4, 2018
…e-default-distribution

* elastic/master: (213 commits)
  ML: Fix build after HLRC change
  Fix inner hits retrieval when stored fields are disabled (_none_) (elastic#33018)
  SQL: Show/desc commands now support table ids (elastic#33363)
  Mute testValidateFollowingIndexSettings
  HLRC: Add delete by query API (elastic#32782)
  [ML] The sort field on get records should default to the record_score (elastic#33358)
  [ML] Minor improvements to categorization Grok pattern creation (elastic#33353)
  [DOCS] fix a couple of typos (elastic#33356)
  Disable assemble task instead of removing it (elastic#33348)
  Simplify the return type of FieldMapper#parse. (elastic#32654)
  [ML] Delete forecast API (elastic#31134) (elastic#33218)
  Introduce private settings (elastic#33327)
  [Docs] Add search timeout caveats (elastic#33354)
  TESTS: Fix Race Condition in Temp Path Creation (elastic#33352)
  Fix from_range in search_after in changes snapshot (elastic#33335)
  TESTS+DISTR.: Fix testIndexCheckOnStartup Flake (elastic#33349)
  Null completion field should not throw IAE (elastic#33268)
  Adds code to help with IndicesRequestCacheIT failures (elastic#33313)
  Prevent NPE parsing the stop datafeed request. (elastic#33347)
  HLRC: Add ML get overall buckets API (elastic#33297)
  ...
benwtrent added a commit that referenced this pull request Sep 4, 2018
* [ML] Delete forecast API (#31134) (#33218)

* Delete forecast API (#31134)

* Adjust for backport

* removing bad import
benwtrent added a commit that referenced this pull request Sep 5, 2018
* [ML] Delete forecast API (#31134) (#33218)

* Delete forecast API (#31134)

* Adjust for backport

* removing bad import

* Fixing delete forecast action
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants